A High-performance Plagiarism Detection System - Notebook for PAN at CLEF 2011

Authors

  • Neil Cooke
  • Lee Gillam
  • Peter Wrobel
  • Henry Cooke
  • Fahad Al-Obaidli

Abstract

In this paper we report on our high-performance plagiarism detection system, which processes the PAN plagiarism corpus for the external plagiarism detection task in relatively short timescales compared with the previously reported state of the art, while still achieving a reasonable level of detection performance (PAN 11: 4th place, PlagDet=0.2467329, Recall=0.1500480, Precision=0.7106536, Granularity=1.0058894). At the core of our system is a simple method that avoids hash-type approaches; we are unable to disclose many details of it because a patent application is in progress. We optimised our performance using the PAN 10 collection and used the best parameters for the final submission. We anticipated broadly similar performance at PAN 11, modulo changes to the plagiarism cases, and 4th place this year put us between the participants who had placed 5th and 6th in PAN 10.
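
The four figures reported in the abstract are related: in the PAN evaluation framework, PlagDet is conventionally defined as the F1 measure of precision and recall divided by log2(1 + granularity). The abstract itself does not state this formula, so the following is a minimal sketch under that assumption, and the function name `plagdet` is hypothetical. It says nothing about the authors' undisclosed detection method; it only recombines the published scores.

```python
import math


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """Combine precision, recall and granularity into a single score.

    Assumes the usual PAN definition: F1 / log2(1 + granularity).
    Granularity is at least 1 (1 means each case is detected exactly once).
    """
    if granularity < 1.0:
        raise ValueError("granularity must be >= 1")
    if precision + recall == 0.0:
        return 0.0  # no correct detections at all
    f1 = 2.0 * precision * recall / (precision + recall)
    return f1 / math.log2(1.0 + granularity)


# Values taken from the abstract above.
score = plagdet(precision=0.7106536, recall=0.1500480, granularity=1.0058894)
print(f"PlagDet = {score:.7f}")
```

Under this definition the recomputed score should agree closely with the reported PlagDet of 0.2467329. It also shows that the low recall, rather than precision or granularity, dominates the overall score.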

Similar resources

External & Intrinsic Plagiarism Detection: VSM & Discourse Markers based Approach - Notebook for PAN at CLEF 2011

This paper explains the performance of a plagiarism detection system that can detect external as well as intrinsic plagiarism in text. It reports results on the PAN-PC-2011 test corpus. We investigated Vector Space Model based techniques for detecting external plagiarism cases and discourse-marker-based features for detecting intrinsic plagiarism cases.

Developing Monolingual Persian Corpus for Extrinsic Plagiarism Detection Using Artificial Obfuscation: Notebook for PAN at CLEF 2015

The text alignment corpus construction task at the PAN 2015 competition consists of preparing a plagiarism corpus that provides various obfuscation types and a range of obfuscation degrees. Its format and metadata structure should also follow previous PAN plagiarism corpora. In this paper, we describe our approach for construction of a monolingual Persian plagiarism corpus that can...

External Plagiarism Detection using Information Retrieval and Sequence Alignment - Notebook for PAN at CLEF 2011

This paper describes the University of Sheffield entry for the 3rd International Competition on Plagiarism Detection which attempted the monolingual external plagiarism detection task. A three stage framework was used: preprocessing and indexing, candidate document selection (using an Information Retrieval based approach) and detailed analysis (using the Running Karp-Rabin Greedy String Tiling ...

Diverse Queries and Feature Type Selection for Plagiarism Discovery - Notebook for PAN at CLEF 2013

This paper describes approaches used for the Plagiarism Detection task in the PAN 2013 international competition on uncovering plagiarism, authorship, and social software misuse. We present a modified three-way search methodology for the Source Retrieval subtask and analyse snippet similarity performance. The results show that the presented approach is adaptable to real-world plagiarism situations. For the ...

Approaches for Intrinsic and External Plagiarism Detection - Notebook for PAN at CLEF 2011

Plagiarism detection has been considered as a classification problem which can be approximated with intrinsic strategies, considering self-based information from a given document, and external strategies, considering comparison techniques between a suspicious document and different sources. In this work, both intrinsic and external approaches for plagiarism detection are presented. First, the m...

Publication date: 2011